Bengali and Hindi to English CLIR Evaluation

Authors

  • Debasis Mandal
  • Sandipan Dandapat
  • Mayank Gupta
  • Pratyush Banerjee
  • Sudeshna Sarkar
Abstract

Our participation in CLEF 2007 consisted of two cross-lingual text retrieval experiments and one monolingual text retrieval experiment in the Ad-hoc bilingual track. The cross-language task involves retrieving English documents in response to queries in two Indian languages, Hindi and Bengali. The Hindi and Bengali queries were first processed using a morphological analyzer (Bengali), a stemmer (Hindi), and stop-word lists of 200 Hindi and 273 Bengali words. The refined Hindi queries were then looked up in the Hindi-English bilingual lexicon ‘Shabdanjali’ (approx. 26K Hindi words); whenever a match was found, all of the corresponding translations were used to generate the equivalent English query. The remaining query words were transliterated using the ITRANS scheme. For the Bengali queries, we had to depend mostly on transliteration because no effective Bengali-English bilingual lexicon was available. The resulting English query was then fed into the Lucene search engine for monolingual retrieval of the English documents. The CLEF evaluations suggested the need for a rich bilingual lexicon, a good Named Entity Recognizer, and a better transliterator for CLIR involving Indian languages. The best MAP values in our experiments were 7.26 for Bengali and 4.77 for Hindi CLIR, which are 20% and 13%, respectively, of our monolingual retrieval MAP.
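The Hindi query-translation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: queries are split on whitespace, stop words are removed before stemming, a stemmed term found in the lexicon contributes all of its listed translations, and every other term falls back to transliteration. The names `translate_query`, `TOY_STOP_WORDS`, `TOY_LEXICON`, `TOY_ROMAN`, and `toy_transliterate` are hypothetical, the toy data is invented (not taken from Shabdanjali), and `toy_transliterate` is a fixed lookup table standing in for a real ITRANS transliterator.

```python
from typing import Callable, Dict, List, Set


def translate_query(
    query: str,
    stop_words: Set[str],
    stem: Callable[[str], str],
    lexicon: Dict[str, List[str]],
    transliterate: Callable[[str], str],
) -> str:
    """Build an English query from a Hindi query (hypothetical sketch).

    Assumptions: whitespace tokenization, stop words removed before
    stemming, and out-of-lexicon terms transliterated after stemming.
    """
    english_terms: List[str] = []
    for token in query.split():
        if token in stop_words:
            continue  # drop stop words before any lookup
        root = stem(token)
        if root in lexicon:
            # Keep every translation listed for the term, not just the first.
            english_terms.extend(lexicon[root])
        else:
            # Not in the lexicon: fall back to transliteration.
            english_terms.append(transliterate(root))
    return " ".join(english_terms)


# Toy data for illustration only (hypothetical, not from Shabdanjali).
TOY_STOP_WORDS = {"में", "का"}
TOY_LEXICON = {"चुनाव": ["election", "choice"]}
TOY_ROMAN = {"भारत": "bharat"}


def toy_transliterate(word: str) -> str:
    # Placeholder for a real ITRANS transliterator: a fixed lookup table
    # that returns the word unchanged when it has no entry.
    return TOY_ROMAN.get(word, word)


if __name__ == "__main__":
    english = translate_query(
        "भारत में चुनाव",
        TOY_STOP_WORDS,
        lambda w: w,  # identity "stemmer" keeps the example small
        TOY_LEXICON,
        toy_transliterate,
    )
    print(english)  # prints: bharat election choice
```

Keeping every translation of a matched term can help recall, but it may also add noise from irrelevant word senses; the resulting string would then be passed to the retrieval engine (Lucene, in the abstract).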

Similar resources

Bengali, Hindi and Telugu to English Ad-hoc Bilingual Task at CLEF 2007

This paper presents the experiments carried out at Jadavpur University as part of participation in the CLEF 2007 ad-hoc bilingual task. This is our first participation in the CLEF evaluation task and we have considered Bengali, Hindi and Telugu as query languages for the retrieval from English document collection. We have discussed our Bengali, Hindi and Telugu to English CLIR system as part of...

Bengali and Hindi to English Cross-language Text Retrieval under Limited Resources

This paper describes our experiments with two cross-lingual and one monolingual English text retrieval runs at CLEF in the ad-hoc track. The cross-language task includes the retrieval of English documents in response to queries in the two most widely spoken Indian languages, Hindi and Bengali. For our experiment, we had access to a Hindi-English bilingual lexicon, ‘Shabdanjali’, consisting of approx. 26K H...

Improving Performance Of English-Hindi Cross Language Information Retrieval Using Transliteration Of Query Terms

The main issue in Cross Language Information Retrieval (CLIR) is poor retrieval performance, in terms of average precision, compared with monolingual retrieval. The main reasons behind the poor performance of CLIR are mismatched query terms, lexical ambiguity and untranslated query terms. These problems need to be addressed in order to increase the perf...

IIIT Hyderabad’s CLIR experiments for FIRE-2008

This paper discusses our CLIR experiments performed for the FIRE workshop. We submitted runs for Ad-hoc monolingual document retrieval in Hindi and English, and for Ad-hoc cross-lingual document retrieval from Hindi to English and from English to Hindi. In this paper, we describe our English to Hindi and Hindi to English CLIR systems and the experiments conducted on them using the FIRE 2008 data...

Leveraging Statistical Transliteration for Dictionary-Based English-Bengali CLIR of OCR'd Text

This paper describes experiments with transliteration of out-of-vocabulary English terms into Bengali to improve the effectiveness of English-Bengali Cross-Language Information Retrieval. We use a statistical translation model as a basis for transliteration, and present evaluation results on the FIRE 2011 RISOT Bengali test collection. Incorporating transliteration is shown to substantially and...

Publication date: 2007